The differences between latent topics in abstracts and citation contexts of citing papers

Authors

  • Shengbo Liu
  • Chaomei Chen
Abstract

of the cited reference. They share a similar topic concerning evolutionary models and MrBayes. Topics 2, 3, and 5 were less related to the abstract of the cited reference.

The comparison of topics in citing abstracts and citing sentences

The similarity matrices of topics in citing abstracts and citing sentences are shown in Tables 5 and 6. The topics in citing abstracts have an average cosine similarity of 0.044, much lower than the average of 0.369 for topics in citing sentences. Table 7 shows the similarity matrix between topics in citing abstracts and topics in citing sentences. The most closely related pair is topic 2 in citing sentences and topic 6 in citing abstracts, with a cosine similarity of 0.21; both concern the same subject, amino acid research. The cosine similarities between the other topics are very low. The average cosine similarity across all pairs of topics from citing abstracts and citing sentences is 0.038, lower than the within-group averages for both citing sentences (0.369) and citing abstracts (0.044).

Figure 6 shows the relationships among all the topics in citing sentences and citing abstracts based on their cosine similarities. Topics in citing sentences are shown as circles and topics in citing abstracts as squares. Edges represent the cosine similarity between topics, and only edges with a weight greater than 0.15 are drawn. Most of the topics in citing sentences are connected to one another, except topic 5, whereas there are fewer connections among topics in citing abstracts. We find that some pairs of topics, one from citing sentences and one from citing abstracts, are strongly connected, such as topic 2 in citing sentences and topic 6 in citing abstracts, and topic 6 in citing sentences and topic 0 in citing abstracts. The results indicate that topics in citing sentences differ from those in citing abstracts: the cosine similarities between topics identified from citing sentences and those identified from citing abstracts are very low, and topics in citing sentences are more tightly coupled than topics in citing abstracts.

TABLE 1. Top 14 terms in each topic of citing abstracts (noun phrases).
Topic 0: genome sequence, genetic analysis, common ancestor, protein sequence, phylogenetic tree, daphnia pulex, comparative analysis, nervous system, convergent evolution, model organisms, related proteins, different species, resultswe report, evolutionary change
Topic 1: mitochondrial genome, nuclear gene, control region, coding gene, complete mitochondrial genome sequence, molecular evolution, mitochondrial dna, codon usage, main lineages, genome sequence, amino acid, sister-group relationship, phylogenetic resolution, amino acid sequence
Topic 2: evolutionary relationship, genetic analysis, morphological characters, dna sequences, nuclear gene, phylogenetic inference, bayesian analysis, phylogenetic reconstruction, phylogenetic signal, sequence data, molecular data, morphological data, phylogenetic hypothesis, maximum parsimony criterion
Topic 3: gene family, gene duplication event, expression pattern, transcription factor, gene expression, important role, amino acid sequence, genetic analysis, duplication event, binding site, sequence similarity, common ancestor, gene product, drosophila melanogaster
Topic 4: phylogenetic relationship, sequence data, secondary structure, gene tree, phylogenetic tree, evolutionary process, phylogenetic reconstruction, speciation event, bayesian inference analysis, posterior probability, phylogenetic analysis, sequence alignment, variable site, evolutionary model
Topic 5: gene cluster, comparative genomic analysis, teleost fish, lateral gene transfer, gene loss, phylogenetic tree, genomic analysis, evolutionary scenario, human genome, housekeeping genes, cell division, sequence tags, sequenced genomes, first step
Topic 6: positive selection, amino acid model, molecular evolution, purifying selection, adaptive evolution, subtilisin-like serine protease, functional divergence, concerted evolution, duplication event, domain architecture, amino acid site, important role, selective constraint, positive darwinian selection
Topic 7: complex history, horizontal gene transfer, land plants, green algae, green plants, genetic analysis, red algae, phylogenetic evidence, sequence evolution, vertical inheritance, evolutionary rates, public database, ribosomal rna, eukaryotic genomes
Topic 8: divergence time, molecular phylogeny, phylogenetic relationship, molecular clock, major clades, rapid radiation, fossil record, monophyletic group, phylogenetic framework, old world, deep divergence, evolutionary history, sequence data, major group
Topic 9: genetic diversity, heuristic algorithm, genetic structure, gene flow, mitochondrial dna, range expansion, genetic variation, genetic subdivision, arabidopsis thaliana, phylogenetic history, nuclear markers, geographical distribution, northern australia, population structure

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—March 2013 633 DOI: 10.1002/asi

The breadth of a term was measured using the concept of information entropy. First, 100 terms were selected from each group according to their frequency. Second, the information entropy of each term was calculated. Then we compared the mean and median information entropy of the two groups. Figure 7 shows the results of the comparison. Figure 7(a) compares the average information entropy, which decreases as the number of terms grows. The average information entropy of terms in citing sentences is consistently lower than that of terms in citing abstracts.
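The topic comparison earlier in this section (Tables 5 to 7 and Figure 6) rests on cosine similarity between topics. The following Python sketch computes a within-group average, a cross-group average, and a thresholded edge list like the one drawn in Figure 6. It is an illustration under stated assumptions, not the authors' code. Each topic is assumed to be a dictionary mapping terms to weights (for example, topic-term probabilities from a topic model), and a missing term counts as weight 0. The within-group average is assumed to cover distinct topic pairs only, excluding the diagonal. The function names are hypothetical.

```python
import math
from itertools import combinations

# ASSUMPTION: a topic is a sparse term -> weight mapping (hypothetical format).
Topic = dict[str, float]


def cosine(a: Topic, b: Topic) -> float:
    """Cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def within_average(topics: list[Topic]) -> float:
    """Mean cosine similarity over distinct pairs within one topic set
    (ASSUMPTION: the diagonal of the similarity matrix is excluded)."""
    pairs = list(combinations(topics, 2))
    if not pairs:
        raise ValueError("need at least two topics")
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def cross_average(xs: list[Topic], ys: list[Topic]) -> float:
    """Mean cosine similarity over every (x, y) pair across two topic sets."""
    return sum(cosine(x, y) for x in xs for y in ys) / (len(xs) * len(ys))


def similarity_graph(sentence_topics, abstract_topics, threshold=0.15):
    """Edges between any two topics (within or across sets) whose cosine
    similarity exceeds the threshold. Nodes are ("S", i) for citing-sentence
    topics and ("A", j) for citing-abstract topics."""
    nodes = [("S", i, t) for i, t in enumerate(sentence_topics)]
    nodes += [("A", j, t) for j, t in enumerate(abstract_topics)]
    edges = []
    for (ka, ia, ta), (kb, ib, tb) in combinations(nodes, 2):
        s = cosine(ta, tb)
        if s > threshold:
            edges.append(((ka, ia), (kb, ib), s))
    return edges
```

Sparse dictionaries avoid building a shared vocabulary matrix when the two topic models were fitted on different term sets.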
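This excerpt does not say which distribution the term entropy is computed over. The sketch below is therefore illustrative only. It assumes that a term's breadth is the Shannon entropy of its occurrence counts across some set of categories (such as topics or documents), divided by the logarithm of the number of categories so that values fall between 0 and 1. It also assumes that terms are ranked by total frequency before the mean and median of each top-k prefix are summarized, as in Figure 7. All function names are hypothetical.

```python
import math
import statistics


def normalized_entropy(counts):
    """Shannon entropy of a term's counts across categories, divided by
    log(number of categories) so the result lies in [0, 1].
    ASSUMPTION: the categories (e.g. topics or documents) are illustrative."""
    total = sum(counts)
    k = len(counts)
    if total == 0 or k < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return max(0.0, h / math.log(k))  # clamp a possible -0.0 to 0.0


def top_term_entropies(term_counts, n=100):
    """term_counts maps term -> per-category counts. Rank terms by total
    frequency, keep the top n, and return (term, entropy) in rank order."""
    ranked = sorted(term_counts.items(), key=lambda kv: sum(kv[1]), reverse=True)
    return [(term, normalized_entropy(counts)) for term, counts in ranked[:n]]


def summarize_prefixes(entropies, sizes=range(10, 101, 10)):
    """Mean and median entropy of the top-k terms for each k that fits."""
    values = [h for _, h in entropies]
    return {
        k: (statistics.mean(values[:k]), statistics.median(values[:k]))
        for k in sizes
        if k <= len(values)
    }
```

Running `top_term_entropies` once for the citing-abstract terms and once for the citing-sentence terms, then passing each result to `summarize_prefixes`, gives the two mean and median curves that would be plotted against the number of terms.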
Because this comparison was not based on all the terms, Student’s t test was used to assess whether the means of the two groups differ significantly. We assumed that the two groups have equal variances and used a one-tailed test. Table 8 lists the results of 10 t-tests based on different numbers of terms. The p-values decrease as the number of terms grows. At a significance level of 0.05, the difference between the two groups is significant when the number of terms is between 40 and 100. As shown in Figure 7(a), abstract terms tend to be broader than citing-sentence terms, and the difference is statistically significant from 40 terms onward. The comparison of median values in the two groups yields the same result (Figure 7(b)). These comparisons indicate that, in these data sets, terms in citing sentences have lower information entropy than terms in citing abstracts; that is, terms in citing sentences are narrower than terms in citing abstracts.

We present the results in Tables 9 and 10. Table 9 shows the information entropy of the top 14 terms in the citing abstracts. All values in this group are higher than 0.4, and their average is 0.56. Most of the terms in Table 9 represent a field or refer to many research fields. The information entropy of terms in citing sentences is shown in Table 10; their average is 0.51. The term “mrbayes” was removed from the citing sentences because it appeared in most of them and had a much higher frequency than any other term. Most of the terms in Table 10 relate to research methods that are specific to particular fields.
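A one-tailed Student's t test with pooled (equal) variances, as described above, can be sketched with the Python standard library alone. The alternative hypothesis is that the citing-abstract mean exceeds the citing-sentence mean. For t ≥ 0 the p-value uses the standard identity P(T > t) = 0.5 · I_x(ν/2, 1/2) with x = ν/(ν + t²), where I is the regularized incomplete beta function. Here I is evaluated with the continued fraction from Numerical Recipes (`betacf`, modified Lentz method). The function names are hypothetical. So is the assumption that the ten tests in Table 8 compare the top 10, 20, ..., 100 terms of each group. This is not the authors' code.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_upper_tail(t, df):
    """P(T > t) for Student's t distribution with df degrees of freedom."""
    half = 0.5 * betai(df / 2.0, 0.5, df / (df + t * t))
    return half if t > 0 else 1.0 - half


def one_tailed_pooled_t_test(a, b):
    """Equal-variance t test with H1: mean(a) > mean(b).
    Returns (t statistic, degrees of freedom, one-tailed p-value)."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df, t_upper_tail(t, df)


def t_tests_by_size(abstract_h, sentence_h, sizes=range(10, 101, 10)):
    """ASSUMPTION: both lists hold term entropies in frequency-rank order;
    run one test on the top-k prefix of each list for every k."""
    limit = min(len(abstract_h), len(sentence_h))
    return {k: one_tailed_pooled_t_test(abstract_h[:k], sentence_h[:k])
            for k in sizes if k <= limit}
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=True, alternative="greater")` performs the same pooled one-tailed test and can serve as a cross-check.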

Similar resources

Investigating Lexico-grammaticality in Academic Abstracts and Their Full Research Papers from a Diachronic Perspective

Development of science and academic knowledge has led to changes in academic language and transfer of information and knowledge. In this regard, the present study is an attempt to investigate lexico-grammaticality in academic abstracts and their full research papers in Linguistics, Chemistry and Electrical engineering papers published during 1991-2015 in academic journals from a diachronic pers...

The online attention to certain nuclear medicine topics: An altmetrics study vs. a citation analysis

Introduction: Traditional citation analysis has been greatly criticized because the process of citation accumulation requires considerable time after publication. So, the term “altmetrics” was proposed in 2010 to measure the scientific and social impact of a paper. We performed a search for certain nuclear medicine topics using the altmetrics approach to report the correlation b...

Joint Modeling of Topics, Citations, and Topical Authority in Academic Corpora

Much of scientific progress stems from previously published findings, but searching through the vast sea of scientific publications is difficult. We often rely on metrics of scholarly authority to find the prominent authors but these authority indices do not differentiate authority based on research topics. We present Latent Topical-Authority Indexing (LTAI) for jointly modeling the topics, cit...

A Quantitative and Qualitative Study of the Articles of the Quarterly Journal of Information Science and Public Libraries, 1387-1391 (Solar Hijri)

Purpose: The aim of this study was to examine 18 issues of the Journal of Information Science and Public Libraries. Methodology: This is a survey-descriptive study with a bibliometric approach. Citation analysis, one of the most popular bibliometric techniques, is used. The research population includes 18 issues of the Journal of Information Science and Public Libraries from the beginning of...

Exploiting Citation Contexts for Physics Retrieval

The text surrounding citations within scientific papers may contain terms that usefully describe cited documents and can benefit retrieval. We present a preliminary study that investigates appending citation contexts from citing documents to cited documents in the iSearch test collection. We examine the effect on information retrieval performance of a range of citation context sizes and their v...

Journal title:
  • JASIST

Volume 64

Publication date: 2013